install.packages(c("tidyverse",
"devtools",
"dataverse",
"here",
"glue",
"ggtext",
"skimr",
"MetBrewer",
"scales",
"maps",
"mapdata",
"leaflet",
"sf",
"htmltools",
"patchwork",
"ggh4x",
"geofacet"))MAIR/APAN Workshop 2024
Last week’s election saw surprising swings away from the Democrats at the Presidential level, including among many blue stronghold states. It is useful to place these political movements within the broader historical context of US Presidential elections. Today, we will use R to collect, explore, and communicate data on past US Presidential Elections. We will use the stories that emerge from trends in these data to develop our understanding of the most recent election and those to come.
We will start by collecting the relevant data using an API.
We will then clean up and explore these data, readying ourselves for some analysis.
We will explore various ways of visualizing these data, including creating interactive plots and maps.
Finally, we will explore tools in R to communicate our findings.
Set up
To follow along with this workshop, you will need:
The latest version of R and RStudio
A new RProject
This
workshop.qmdfile (which you can download from https://github.com/hgoers/mair_apan_workshop_2024)The following R packages installed on to your local device:
If you hover over a code chunk, you should see a little copy icon appear in the top right hand corner. Clicking on that copies the code in the code chunk. You can then easily paste this in your own script.
Collecting your data
We will start by collecting data on historical US Presidential state-level returns. The MIT Election Data + Science Lab provides these returns to the public. To access their data, we will use the Harvard Dataverse API. We will access it using the R package dataverse.
The dataverse package accesses the Harvard Dataverse Application Programming Interface (API) in the background. To learn more about how to access this API directly, check out the Harvard Dataverse documentation.
library(tidyverse)
library(dataverse)We will be accessing the U.S. President 1976–2020 data set. To do this, we need three pieces of information:
The name of the file we want to download
The data set’s DOI
The data set’s format
We can get all of this information from the data set’s page on the Harvard Dataverse website.
state_results_df <- get_dataframe_by_name(
filename = "1976-2020-president.tab",
dataset = "10.7910/DVN/42MVDX",
server = "dataverse.harvard.edu",
original = T,
.f = readr::read_csv
)This code programmatically pulls the most up-to-date data set from the API. Once the MIT Election Lab publishes the 2024 Presidential Election data, you will only need to update the file name to be able to access it.
Saving your data
You should always save a copy of your data locally, particularly when those data are updated regularly. The results of your analysis may change with updates to the raw data. You want to be able to track those changes (and uncover why they occurred).
The here R package makes saving data in a robust way very easy. Using here::here(), you can automatically update file paths to reflect the computer you are currently using. To demonstrate, run the following code:
here::here()[1] "/Users/harrietgoers/Documents/mair_apan_workshop_2024"
Your output will be different to mine. here::here() locates the top level of the file directory in which you are located. This is helpful when you share your code with others (or switch to a new computer). Hard coded file paths will cause annoying errors.
I like to save my raw data in a folder called data-raw. Go ahead a make a similar folder in your RProject.
To point here::here() to this folder, you simply supply it with the sub-folder name(s) as a string:
here::here("data-raw")[1] "/Users/harrietgoers/Documents/mair_apan_workshop_2024/data-raw"
I am going to save a copy of the latest US Presidential Election results. I am going to include in the file name the date that I pulled the data. This stops me from overwriting previous versions of the data. I need those previous versions if I want to track changes made.
I use glue::glue() to evaluate my R code (enclosed in those curly brackets) and convert the output into a string.
I use lubridate::today() (loaded in with the library(tidyverse) command) to get today’s date.
write_csv(state_results_df,
here::here("data-raw", glue::glue("pres_state_results_{today()}.csv")))Reading in the latest version of your data
To make sure you are reading in the latest version of your data, you can run the following:
state_results_df <- list.files(path = here::here("data-raw"),
pattern = "pres_state_results_.*",
full.names = T) |>
max() |>
read_csv()Exploring your data
We now have the latest published data on the number of votes each US Presidential candidate received in all elections from 1976 to 2020. Let’s take a good look at it.
First, I want to get a quick overview of my data. I like to use skimr::skim() to do this:
skimr::skim(state_results_df)| Name | state_results_df |
| Number of rows | 4287 |
| Number of columns | 15 |
| _______________________ | |
| Column type frequency: | |
| character | 6 |
| logical | 2 |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| state | 0 | 1.00 | 4 | 20 | 0 | 51 | 0 |
| state_po | 0 | 1.00 | 2 | 2 | 0 | 51 | 0 |
| office | 0 | 1.00 | 12 | 12 | 0 | 1 | 0 |
| candidate | 287 | 0.93 | 4 | 38 | 0 | 269 | 0 |
| party_detailed | 456 | 0.89 | 3 | 40 | 0 | 172 | 0 |
| party_simplified | 0 | 1.00 | 5 | 11 | 0 | 4 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| writein | 3 | 1 | 0.11 | FAL: 3807, TRU: 477 |
| notes | 4287 | 0 | NaN | : |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 1999.08 | 14.22 | 1976 | 1988 | 2000 | 2012.0 | 2020 | ▇▅▅▅▇ |
| state_fips | 0 | 1 | 28.62 | 15.62 | 1 | 16 | 28 | 41.0 | 56 | ▇▇▇▇▇ |
| state_cen | 0 | 1 | 53.67 | 26.03 | 11 | 33 | 53 | 81.0 | 95 | ▆▆▇▃▇ |
| state_ic | 0 | 1 | 39.75 | 22.77 | 1 | 22 | 42 | 61.0 | 82 | ▇▆▇▇▅ |
| candidatevotes | 0 | 1 | 311907.59 | 764801.10 | 0 | 1177 | 7499 | 199241.5 | 11110250 | ▇▁▁▁▁ |
| totalvotes | 0 | 1 | 2366924.15 | 2465008.31 | 123574 | 652274 | 1569180 | 3033118.0 | 17500881 | ▇▂▁▁▁ |
| version | 0 | 1 | 20210113.00 | 0.00 | 20210113 | 20210113 | 20210113 | 20210113.0 | 20210113 | ▁▁▇▁▁ |
Checking your data set is complete
Let’s make sure that we have all years covered. US Presidential elections occur every four years, so we should have 12 unique years in our data set. Let’s confirm that:
state_results_df |>
distinct(year)# A tibble: 12 × 1
year
<dbl>
1 1976
2 1980
3 1984
4 1988
5 1992
6 1996
7 2000
8 2004
9 2008
10 2012
11 2016
12 2020
Great! I wonder if the number of presidential candidates has changed over time. Let’s see how many people ran each year:
n_candidates_annual <- state_results_df |>
distinct(year, candidate) |>
drop_na() |>
count(year)
n_candidates_annual# A tibble: 12 × 2
year n
<dbl> <int>
1 1976 15
2 1980 21
3 1984 17
4 1988 18
5 1992 25
6 1996 24
7 2000 20
8 2004 23
9 2008 27
10 2012 35
11 2016 32
12 2020 102
This is a bit unwieldy. Let’s visualize it to get a better sense:
ggplot(n_candidates_annual, aes(x = year, y = n)) +
geom_line(linewidth = 1) +
geom_point() +
theme_minimal() +
theme(plot.title = element_text(face = "bold"),
plot.title.position = "plot") +
labs(title = glue::glue("Number of people running for US President, {min_year} - {max_year}"),
x = "Year",
y = NULL,
caption = "Source: MIT Election Lab")Hmm, that’s a shocking spike in 2020. What happened? Let’s take a closer look at the data.
First, let’s confirm that these are indeed individual candidates:
state_results_df |>
filter(year == 2020) |>
distinct(candidate) |>
drop_na()# A tibble: 102 × 1
candidate
<chr>
1 "BIDEN, JOSEPH R. JR"
2 "TRUMP, DONALD J."
3 "JORGENSEN, JO"
4 "DE LA FUENTE, ROQUE \"\"ROCKY\"\""
5 "PIERCE, BROCK"
6 "JANOS, JAMES G. \"JESSE VENTURA\""
7 "BLANKENSHIP, DON"
8 "LA RIVA, GLORIA ESTELLA"
9 "CUMMINGS, DANIEL CLYDE"
10 "SIMMONS, JADE"
# ℹ 92 more rows
Let’s then check whether one party is driving this:
party_count_annual <- state_results_df |>
distinct(year, party_simplified, candidate) |>
drop_na(candidate) |>
count(year, party_simplified)ggplot(party_count_annual, aes(x = year, y = n, colour = party_simplified)) +
geom_line() +
geom_point() +
theme_minimal() +
theme(plot.title = element_text(face = "bold"),
plot.title.position = "plot",
legend.position = "top") +
labs(title = glue::glue("Number of people running for US President, {min_year} - {max_year}"),
x = "Year",
y = NULL,
colour = "Party",
caption = "Source: MIT Election Lab") +
scale_color_manual(values = MetBrewer::met.brewer("Homer1"))It does appear that the number of “Other” candidates is unusually large (and has been driving the general trend of increasing numbers of candidates over the years). The other large party groups - the Democrats, Republicans, and Libertarians - appear to have fielded their usual number of candidates.
Could this be driven by a large number of write-in candidates?
write_in_count_annual <- state_results_df |>
drop_na(candidate, writein) |>
distinct(year, writein, candidate) |>
count(year, writein)
ggplot(write_in_count_annual, aes(x = year, y = n, colour = writein)) +
geom_line() +
geom_point() +
theme_minimal() +
theme(plot.title = element_text(face = "bold"),
plot.title.position = "plot",
legend.position = "top") +
labs(title = glue::glue("Number of people running for US President, {min_year} - {max_year}"),
x = "Year",
y = NULL,
colour = "Party",
caption = "Source: MIT Election Lab") +
scale_color_manual(values = MetBrewer::met.brewer("Homer1"))Partially, yes. There is a huge spike in the number of candidates written in to the ballot. There also appears to be an increase in the number of candidates on the ballot; however, this appears to be following a general trend across time.
Note that there are some years in which no write-in candidates emerged (1976 to 1984, 1992, 1996, and 2012). The way we calculated the number of write-in and non-write-in candidates above leaves these years as NAs, not zero counts. To fix this, do the following:
# Create a scope data frame
scope_df <- state_results_df |>
distinct(year) |>
crossing(writein = c(T, F))
# Append your calculations to that scope
write_in_count_annual_full <- scope_df |>
left_join(write_in_count_annual, by = join_by(year, writein)) |>
mutate(n = replace_na(n, 0))
ggplot(write_in_count_annual_full, aes(x = year, y = n, colour = writein)) +
geom_line() +
geom_point() +
theme_minimal() +
theme(plot.title = element_text(face = "bold"),
plot.title.position = "plot",
legend.position = "top") +
labs(title = glue::glue("Number of people running for US President, {min_year} - {max_year}"),
x = "Year",
y = NULL,
colour = "Party",
caption = "Source: MIT Election Lab") +
scale_color_manual(values = MetBrewer::met.brewer("Homer1"))Could this be explained by one (or a small group of states)?
state_count_annual <- state_results_df |>
distinct(year, state_po, candidate) |>
drop_na(candidate) |>
count(year, state_po)ggplot(state_count_annual, aes(x = year, y = n)) +
geom_line() +
geom_point() +
facet_wrap(~ state_po) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"),
plot.title.position = "plot") +
labs(title = glue::glue("Number of people running for US President, {min_year} - {max_year}"),
x = "Year",
y = NULL,
colour = "Party",
caption = "Source: MIT Election Lab") +
scale_x_continuous(breaks = seq(min_year, max_year, 20))Can anyone see any commonalities between the states in which there were large spikes in the number of candidates running?
Focusing on the major parties
We want to place this year’s Presidential election within its historical context. We have explored our data to get a general understanding of those trends, but we now want to formalize them.
To begin, let’s focus our data:
pres_state_results_df <- state_results_df |>
filter(
# Isolate DEM and REP from other party candidates
party_simplified %in% c("DEMOCRAT", "REPUBLICAN"),
# Remove other candidates
!candidate %in% c("OTHER", NA_character_)
) |>
# Get the total count of votes cast for other candidates, preserving
# the total vote count for the state and the candidate
group_by(year, state, state_po, party_simplified, candidate, totalvotes) |>
summarise(candidatevotes = sum(candidatevotes)) |>
# Always remember to ungroup your data
ungroup() |>
# Calculate the proportion of votes cast in each state won by each
# candidate
mutate(prop_votes = candidatevotes / totalvotes)
pres_state_results_df# A tibble: 1,224 × 8
year state state_po party_simplified candidate totalvotes candidatevotes
<dbl> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1976 ALABAMA AL DEMOCRAT CARTER, … 1182850 659170
2 1976 ALABAMA AL REPUBLICAN FORD, GE… 1182850 504070
3 1976 ALASKA AK DEMOCRAT CARTER, … 123574 44058
4 1976 ALASKA AK REPUBLICAN FORD, GE… 123574 71555
5 1976 ARIZONA AZ DEMOCRAT CARTER, … 742719 295602
6 1976 ARIZONA AZ REPUBLICAN FORD, GE… 742719 418642
7 1976 ARKANSAS AR DEMOCRAT CARTER, … 767535 498604
8 1976 ARKANSAS AR REPUBLICAN FORD, GE… 767535 267903
9 1976 CALIFORN… CA DEMOCRAT CARTER, … 7803770 3742284
10 1976 CALIFORN… CA REPUBLICAN FORD, GE… 7803770 3882244
# ℹ 1,214 more rows
# ℹ 1 more variable: prop_votes <dbl>
Quickly check to make sure everything looks correct. Are there any outliers in votes cast?
ggplot(pres_state_results_df, aes(x = totalvotes)) +
geom_boxplot()Our data are skewed. This is likely because there are large differences in states’ populations, with a few states having very large populations. Let’s check that:
pres_state_results_df |>
distinct(year, state, totalvotes) |>
# Account for changes to population counts over the decades
group_by(year) |>
# Cut the data into five groups based on vote
mutate(vote_group = cut_interval(totalvotes, 5,
labels = c("Rare",
"Medium rare",
"Medium",
"Well done",
"Burnt"))) |>
ungroup() |>
filter(vote_group %in% c("Well done", "Burnt")) |>
distinct(state)# A tibble: 4 × 1
state
<chr>
1 CALIFORNIA
2 NEW YORK
3 FLORIDA
4 TEXAS
These are the states with the four largest populations in the country. Those “outliers” are just an artifact of the distribution of people across America.
Can you think of anything else we should check in these data?
Tracking proportion of votes won by state and over time
Now that we have cleaned and checked our data, we can answer our initial research question: What proportion of votes did the Democratic and Republican candidates win in each state in each year?
Let’s play around with the 2020 Presidential election results:
pres_state_results_2020 <- pres_state_results_df |>
filter(year == 2020)Let’s look at some different ways to plot the results by state. First, let’s plot the proportion of votes won by each candidate in each state in a way that makes the comparison easy to see.
pres_state_results_2020 |>
mutate(prop_votes = if_else(party_simplified == "REPUBLICAN", -prop_votes, prop_votes)) |>
ggplot(aes(x = prop_votes, y = reorder(state, prop_votes),
fill = party_simplified)) +
geom_col() +
geom_vline(xintercept = c(-0.5, 0.5), colour = "grey") +
theme_minimal() +
theme(legend.position = "none",
plot.title = ggtext::element_markdown(face = "bold"),
plot.title.position = "plot") +
labs(title = "Percentage of votes won by <span style='color:#4672AB;'>Joe Biden</span> and <span style='color:#AA4841;'>Donald Trump</span> in the 2020 US <br>Presidential Election",
x = "Percentage of votes won",
y = NULL) +
scale_fill_manual(values = c("#4672AB", "#AA4841")) +
scale_x_continuous(breaks = c(-1, -0.5, 0.5, 1),
limits = c(-1, 1),
labels = c("-1" = "100%",
"-0.5" = "50%",
"0.5" = "50%",
"1" = "100%"))Next, we can visualize the difference in the proportion of votes won by each candidate:
prop_won_2020 <- pres_state_results_2020 |>
# Make sure the order of the parties is consistent
arrange(state, party_simplified) |>
group_by(state) |>
mutate(prop_adv = prop_votes - lead(prop_votes)) |>
ungroup() |>
drop_na(prop_adv)
ggplot(prop_won_2020, aes(x = prop_adv, y = reorder(state, prop_adv),
fill = prop_adv < 0)) +
geom_col() +
theme_minimal() +
theme(legend.position = "none",
plot.title = ggtext::element_markdown(face = "bold"),
plot.title.position = "plot") +
labs(title = "Advantage won by <span style='color:#4672AB;'>Joe Biden</span> and <span style='color:#AA4841;'>Donald Trump</span> in the 2020 US Presidential <br>Election",
x = "Percentage of votes won",
y = NULL) +
scale_fill_manual(values = c("#4672AB", "#AA4841")) +
scale_x_continuous(labels = scales::percent,
limits = c(-1, 1))Finally, we can plot this on a map:
library(maps)
library(mapdata)
us_states <- map_data("state")
prop_won_sf <- us_states |>
left_join(mutate(prop_won_2020, region = str_to_lower(state)),
by = join_by(region))
ggplot(prop_won_sf, aes(x = long, y = lat, fill = prop_adv, group = group)) +
geom_polygon(color = "lightgrey") +
guides(fill = FALSE) +
theme_void() +
coord_fixed(1.3) +
scale_fill_gradient2(low = "#AA4841",
high = "#4672AB",
midpoint = 0)You can even make this interactive:
library(leaflet)
library(sf)
library(htmltools)
us_states <- read_sf("https://rstudio.github.io/leaflet/json/us-states.geojson")
prop_won_sf <- us_states |>
mutate(state = str_to_upper(name)) |>
left_join(prop_won_2020, by = join_by(state))
mypalette <- colorNumeric(
palette = c("#AA4841", "white", "#4672AB"),
domain = c(-1, 1),
na.color = "transparent"
)
labels <- glue::glue("{prop_won_sf$state}: {scales::percent(prop_won_sf$prop_adv, accuracy = 0.1)}")
leaflet(prop_won_sf) |>
setView(-96, 37.8, 4) |>
addPolygons(fillColor = ~ mypalette(prop_adv),
weight = 2,
color = "white",
fillOpacity = 1,
highlightOptions = highlightOptions(weight = 4,
color = "darkgrey",
fillOpacity = 0.7,
bringToFront = T),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto"))Everyone loves a map, but you should use them cautiously. They can be misleading. Large states (by area) are overemphasized. Small states (and DC) can be very difficult to see. Because the importance of a state’s votes to the outcome of the Presidential election is not proportional to its geographic size, this can obscure the outcome you are trying to communicate.
Looking at trends over time
We are most interested in learning about how support for each party’s Presidential candidate has changed over time. Let’s start to explore these trends.
First, let’s look which party’s candidate won the popular vote over time:
pres_national_yearly <- pres_state_results_df |>
group_by(year, party_simplified) |>
summarise(candidatevotes = sum(candidatevotes),
totalvotes = sum(totalvotes),
prop = candidatevotes / totalvotes) |>
ungroup()p1 <- ggplot(pres_national_yearly, aes(x = year, y = prop, colour = party_simplified)) +
geom_line(linewidth = 1) +
geom_point() +
theme_minimal() +
theme(legend.position = "none",
plot.title = ggtext::element_markdown(face = "bold"),
plot.title.position = "plot") +
labs(title = glue::glue("Proportion of total votes cast won by the <span style='color:#4672AB;'>Democractic</span> and <span style='color:#AA4841;'>Republican</span> <br>candidate in US Presidential elections from {min_year} to {max_year}"),
x = NULL,
y = NULL) +
scale_colour_manual(values = c("#4672AB", "#AA4841")) +
scale_y_continuous(labels = scales::percent)
p1It would be nice to add some information about which candidate ran in each of these years:
library(patchwork)
p2 <- pres_state_results_df |>
distinct(year, party_simplified, candidate) |>
ggplot(aes(x = year, y = reorder(candidate, year), colour = party_simplified)) +
geom_point() +
geom_line(linewidth = 1) +
theme_minimal() +
theme(legend.position = "none") +
labs(x = "Year",
y = NULL) +
scale_colour_manual(values = c("#4672AB", "#AA4841"))
p1 / p2Let’s focus on individual states. We will start with the seven swing states everyone was tracking last Tuesday night:
swing_states_2024 <- c("AZ", "GA", "MI", "NV", "NC", "PA", "WI")
pres_state_results_df |>
filter(state_po %in% swing_states_2024) |>
ggplot(aes(x = year, y = prop_votes, colour = party_simplified)) +
geom_line() +
geom_point() +
facet_wrap(~ state, nrow = 2) +
theme_minimal() +
theme(legend.position = "none") +
labs(x = "Year",
y = NULL) +
scale_colour_manual(values = c("#4672AB", "#AA4841"))This is good, but we can improve the reader’s ability to quickly grasp swings in these states by highlighting the winning party more explictly:
library(ggh4x)
pres_adv_states <- pres_state_results_df |>
pivot_wider(names_from = party_simplified, values_from = prop_votes) |>
group_by(year, state, state_po) |>
summarise(DEMOCRAT = sum(DEMOCRAT, na.rm = T),
REPUBLICAN = sum(REPUBLICAN, na.rm = T)) |>
ungroup()
pres_adv_states |>
filter(state_po %in% swing_states_2024) |>
ggplot(aes(x = year)) +
geom_line(aes(y = DEMOCRAT, colour = "#4672AB")) +
geom_line(aes(y = REPUBLICAN, colour = "#AA4841")) +
facet_wrap(~ state, nrow = 2) +
stat_difference(aes(ymin = REPUBLICAN, ymax = DEMOCRAT), alpha = 0.3) +
scale_colour_identity() +
scale_fill_manual(values = c("#4672AB", "#AA4841")) +
theme_minimal() +
theme(legend.position = "none",
plot.title = element_text(face = "bold"),
plot.title.position = "plot") +
labs(title = "Proportion of total votes cast won by each party's candidate in the 2024 US \nPresidential Election swing states",
x = "Year",
y = NULL)We can do something similar for each party’s safest seats in th 1976 US Presidential Election:
dem_safest_5 <- pres_state_results_df |>
filter(year == 1976, party_simplified == "DEMOCRAT") |>
slice_max(prop_votes, n = 5)
pres_adv_states |>
filter(state_po %in% dem_safest_5$state_po) |>
ggplot(aes(x = year)) +
geom_line(aes(y = DEMOCRAT, colour = "#4672AB")) +
geom_line(aes(y = REPUBLICAN, colour = "#AA4841")) +
facet_wrap(~ state, nrow = 2) +
stat_difference(aes(ymin = REPUBLICAN, ymax = DEMOCRAT), alpha = 0.3) +
scale_colour_identity() +
scale_fill_manual(values = c("#4672AB", "#AA4841")) +
theme_minimal() +
theme(legend.position = "none",
plot.title = element_text(face = "bold"),
plot.title.position = "plot") +
labs(title = "Proportion of total votes cast won by each party's candidate in the Democrat's \nsafest states in the 1976 Presidential Election",
x = "Year",
y = NULL)rep_safest_5 <- pres_state_results_df |>
filter(year == 1976, party_simplified == "REPUBLICAN") |>
slice_max(prop_votes, n = 5)
pres_adv_states |>
filter(state_po %in% rep_safest_5$state_po) |>
ggplot(aes(x = year)) +
geom_line(aes(y = DEMOCRAT, colour = "#4672AB")) +
geom_line(aes(y = REPUBLICAN, colour = "#AA4841")) +
facet_wrap(~ state, nrow = 2) +
stat_difference(aes(ymin = REPUBLICAN, ymax = DEMOCRAT), alpha = 0.3) +
scale_colour_identity() +
scale_fill_manual(values = c("#AA4841", "#4672AB")) +
theme_minimal() +
theme(legend.position = "none",
plot.title = element_text(face = "bold"),
plot.title.position = "plot") +
labs(title = "Proportion of total votes cast won by each party's candidate in the Republican's \nsafest states in the 1976 Presidential Election",
x = "Year",
y = NULL)In fact, we can do this for all US states and present them in a geographically correct way using the geofacet R package:
library(geofacet)
ggplot(pres_adv_states, aes(x = year)) +
geom_line(aes(y = DEMOCRAT, colour = "#4672AB")) +
geom_line(aes(y = REPUBLICAN, colour = "#AA4841")) +
facet_geo(~ state_po) +
stat_difference(aes(ymin = REPUBLICAN, ymax = DEMOCRAT), alpha = 0.3) +
scale_colour_identity() +
scale_fill_manual(values = c("#4672AB", "#AA4841")) +
theme_minimal() +
theme(axis.text = element_blank(),
legend.position = "none",
plot.title = ggtext::element_markdown(face = "bold"),
plot.title.position = "plot") +
labs(title = glue::glue("Proportion of total votes cast won by the <span style='color:#4672AB;'>Democractic</span> and <span style='color:#AA4841;'>Republican</span> <br>candidate in US Presidential elections from {min_year} to {max_year}"),
x = NULL,
y = NULL)Pretty cool! We can see changes to each party’s position in each state over time. We can see from the relative widths of these plots which states are more consistently on party’s stronghold, and which tend to be relatively close races.
Next, we are going to pick an interesting story from these data and learn how to communicate it in an engaging and replicable way with R.
I will shortly be referring to a new language called markdown. If you are not familiar with markdown, I highly recommend this short tutorial: https://www.markdowntutorial.com/.